Objective Optimisation of Automatic Speech-to-Phoneme Alignment Systems

Authors

  • Ladan Baghai-Ravary
  • Greg Kochanski
  • John Coleman
Abstract

This paper presents techniques for objective characterisation of Automatic Speech-to-Phoneme Alignment (ASPA) systems, without the need for human-generated labels to act as a benchmark. As well as being immune to the effects of human variability, these techniques yield diagnostic information which can be helpful in the development of new alignment systems, ensuring that the resulting labels are as consistent as possible. To illustrate this, a total of 48 ASPA systems are used, including three front-end processors. For each processor, the number of states in each phoneme model, and of Gaussian distributions in each state mixture, are adjusted to generate a broad variety of systems. The results are compared using a statistical measure and a model-based Bayesian Monte-Carlo approach. The most consistent alignment system is identified, and is (as expected) in close agreement with typical “baseline” systems used in ASR research.

Related resources

Improving the performance of a continuous speech recognition system using features extracted from speech manifolds in the reconstructed phase space

The design of new methods for extracting features from the speech signal, and the combination of the information they provide, is one of the most effective approaches to improving the performance of automatic speech recognition (ASR) systems. Recent research has shown that the speech signal contains nonlinear and chaotic properties, but the effects of these properties are not used in the continuous ...

Allophone-based acoustic modeling for Persian phoneme recognition

Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighboring phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...

Improved HMM/SVM methods for automatic phoneme segmentation

This paper presents improved HMM/SVM methods for a two-stage phoneme segmentation framework, which tries to imitate the human phoneme segmentation process. The first stage performs hidden Markov model (HMM) forced alignment according to the minimum boundary error (MBE) criterion. The objective is to align a phoneme sequence of a speech utterance with its acoustic signal counterpart based on MBE-...

Analysis of L2 English speech corpus by automatic phoneme alignment

This study tested the application of an adapted HTK for automatic alignment of a speech corpus of Asian speakers’ English. The HTK tool with TIMIT has problems in aligning non-native speakers’ English. New sets of phoneme sequences for each word were listed to test whether an adapted alignment module could accurately analyze the pronunciation of Japanese speakers’ English. The new sets of phoneme sequences p...

Designing and implementing a system for Automatic recognition of Persian letters by Lip-reading using image processing methods

For many years, speech has been the most natural and efficient means of information exchange for human beings. With the advancement of technology and the prevalence of computer usage, researchers have turned to the design and production of speech recognition systems. Among these, lip-reading techniques have encountered many challenges for speech recognition; one of the challenges b...

Publication date: 2009